Fix snapshot compaction bug #339

richcole-at-amazon · 2016-01-29T19:49:59Z

Closes #320

During compaction it was possible that records from a block b1=(l1,u1)
would be pushed down from level i to level i+1. If there is a block
b2=(l2,u2) at level i with k1 = user_key(u1) = user_key(l2) then
a subsequent search for k1 will yield the record l2, which has a smaller
sequence number than u1, because the sort order for records sorts
increasing by user key but decreasing by sequence number.
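To make the ordering concrete, here is a minimal C++ sketch. It assumes a simplified key made of just a user key and a sequence number; the real LevelDB internal key also carries a value type, which is omitted. Key, InternalLess, SortInternal and SplitsUserKey are illustrative names, not LevelDB APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Simplified model of an internal key (assumption: value type omitted).
struct Key {
  std::string user_key;
  uint64_t sequence;
};

// User key ascending, then sequence number descending, so the newest
// version of a user key sorts first.
bool InternalLess(const Key& a, const Key& b) {
  if (a.user_key != b.user_key) return a.user_key < b.user_key;
  return a.sequence > b.sequence;
}

// Sorts keys into internal-key order.
void SortInternal(std::vector<Key>* keys) {
  assert(keys != nullptr);
  std::sort(keys->begin(), keys->end(), InternalLess);
}

// True if the largest key of one file (u1) and the smallest key of the
// next file in the same level (l2) share a user key, i.e. the boundary
// between the two files splits that user key.
bool SplitsUserKey(const Key& u1, const Key& l2) {
  return u1.user_key == l2.user_key;
}
```

Sorting {k1@5, k0@1, k1@9, k2@3} with this order gives k0@1, k1@9, k1@5, k2@3. If a file boundary falls between k1@9 (u1) and k1@5 (l2), compacting only the first file moves k1@9 to level i+1 while k1@5 stays at level i; because reads consult level i before level i+1, the stale k1@5 is what a lookup finds.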

This change adds a call to a new function AddBoundaryInputs to
SetupOtherInputs. AddBoundaryInputs searches for a block b2 matching the
criteria above and adds it to the set of files to be compacted. Whenever
AddBoundaryInputs is called it is important that the compaction fileset
in level i+1 (known as c->inputs_[1] in the code) be recomputed. Each
call to AddBoundaryInputs is followed by a call to GetOverlappingInputs.
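The boundary search can be modelled roughly as follows. This is a sketch under simplifying assumptions, not the code from the patch: Key and File are hypothetical stand-ins for LevelDB's InternalKey and FileMetaData, and AddBoundaryInputsSketch only mirrors the behaviour described above (find a file whose smallest key has the same user key as the current largest key but sorts after it, add it, and repeat from that file's largest key).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical simplified stand-ins for InternalKey and FileMetaData.
struct Key {
  std::string user_key;
  uint64_t sequence;
};

struct File {
  Key smallest;
  Key largest;
};

// Internal-key order: user key ascending, sequence number descending.
int Compare(const Key& a, const Key& b) {
  if (a.user_key < b.user_key) return -1;
  if (a.user_key > b.user_key) return 1;
  if (a.sequence > b.sequence) return -1;
  if (a.sequence < b.sequence) return 1;
  return 0;
}

// Adds boundary files from `level_files` to `compaction_files` until no
// file in the level continues the user key at the current upper boundary.
void AddBoundaryInputsSketch(const std::vector<File>& level_files,
                             std::vector<File>* compaction_files) {
  assert(compaction_files != nullptr);
  if (compaction_files->empty()) return;

  // Largest internal key currently covered by the compaction inputs.
  Key largest = (*compaction_files)[0].largest;
  for (const File& f : *compaction_files) {
    if (Compare(f.largest, largest) > 0) largest = f.largest;
  }

  while (true) {
    // Smallest file that starts with the same user key but sorts after it.
    const File* boundary = nullptr;
    for (const File& f : level_files) {
      if (Compare(f.smallest, largest) > 0 &&
          f.smallest.user_key == largest.user_key &&
          (boundary == nullptr ||
           Compare(f.smallest, boundary->smallest) < 0)) {
        boundary = &f;
      }
    }
    if (boundary == nullptr) break;
    largest = boundary->largest;  // Continue from the added file's end.
    compaction_files->push_back(*boundary);
  }
}
```

The loop always terminates: each added file starts after the current largest key, so the boundary only moves forward through a finite level. Repeating the search matters when one user key spans more than two files. This sketch uses a plain linear scan for clarity.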

SetupOtherInputs is called on both manual and automated compaction
passes. It is called for both level zero and for levels greater than 0.

Testing Done:

A test program (issue320_test) was constructed that performs mutations
while snapshots are active. issue320_test fails without this bug fix
after 64k writes. It passes with this bug fix. It was run with 200M
writes and passed.

Unit tests were written for the new function that was added to the
code. Make check was run and seen to pass.

Signed-off-by: Richard Cole <richcole@amazon.com>

x2c3z4 · 2019-03-12T08:05:06Z

Why has this patch not been merged into master?
This is a very serious defect.
@richcole-at-amazon @jeffreyadean

Is there any unknown background behind this?

x2c3z4 · 2019-03-14T03:21:31Z

@ghemawat
@costan

kylezh · 2019-03-14T03:30:45Z

@cmumford @pwnall @ghemawat

Please take some time to look at this patch. LevelDB is a widely used library and this is a very critical bugfix.

I can reproduce this bug on the master branch (9ce3051).

And I think this is the same problem as #375.

Thanks!

cmumford · 2019-03-18T15:42:23Z

Just sent an email to @richcole-at-amazon to see if he is able to sign the Google CLA.

Makefile is now deprecated so added reference to issue320_test.cc in CMakeLists.txt.

cmumford

Thanks for the pull request. I've made a few minor comments. After those few changes I'll ask Sanjay to take a look at your changes to version_set.cc as he knows this code much better than I.

Makefile

db/version_set_test.cc

issues/issue320_test.cc

richcole-at-amazon · 2019-03-20T18:55:50Z

Thanks Chris. I'll look at making these changes and updating the pull request. I'm going to try to carve out some time on Thursday.

Closes google#320

During compaction it was possible that records from a block b1=(l1,u1) would be pushed down from level i to level i+1. If there is a block b2=(l2,u2) at level i with k1 = user_key(u1) = user_key(l2) then a subsequent search for k1 will yield the record l2, which has a smaller sequence number than u1, because the sort order for records sorts increasing by user key but decreasing by sequence number.

This change adds a call to a new function AddBoundaryInputs to SetupOtherInputs. AddBoundaryInputs searches for a block b2 matching the criteria above and adds it to the set of files to be compacted. Whenever AddBoundaryInputs is called it is important that the compaction fileset in level i+1 (known as c->inputs_[1] in the code) be recomputed. Each call to AddBoundaryInputs is followed by a call to GetOverlappingInputs.

SetupOtherInputs is called on both manual and automated compaction passes. It is called for both level zero and for levels greater than 0.

The original change posted in google#339 has been modified to also include changes made by Chris Mumford <cmumford@google.com> in cmumford@4b72cb1:

1. Releasing snapshots during test cleanup to avoid memory leak warnings.
2. Refactored test to use testutil.h to be in line with other issue tests and to create the test database in the correct temporary location.
3. Added copyright banner.

Otherwise, just minor formatting and limiting character width to 80 characters.

Additionally the change was rebased on top of current master and changes previously made to the Makefile were ported to the CMakeLists.txt.

Testing Done:

A test program (issue320_test) was constructed that performs mutations while snapshots are active. issue320_test fails without this bug fix after 64k writes. It passes with this bug fix. It was run with 200M writes and passed.

Unit tests were written for the new function that was added to the code. Make test was run and seen to pass.

Signed-off-by: Richard Cole <richcole@amazon.com>

cmumford

Just one comment.

cmumford · 2019-03-27T14:53:11Z

issues/issue320_test.cc

+class Issue320 { };
+
+TEST(Issue320, Test) {
+  srandom(0);


@richcole-at-amazon This is failing on Windows with the following message:

error C3861: 'srandom': identifier not found

Can you switch to using std::srand() and std::rand()?

Sure, that should be straightforward. Will update.
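A minimal sketch of the suggested swap, assuming the test only needs a reproducible pseudo-random sequence: std::srand and std::rand come from <cstdlib> and are available with MSVC, unlike the POSIX srandom. SeedGenerator, GenerateRandomNumber and RandomLetters are hypothetical helpers for illustration, not the code that landed in issue320_test.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Seeds the C library generator; std::srand replaces the POSIX srandom().
void SeedGenerator(unsigned int seed) { std::srand(seed); }

// Hypothetical helper: returns a pseudo-random number in [0, max).
int GenerateRandomNumber(int max) {
  assert(max > 0);
  return std::rand() % max;
}

// Hypothetical helper: builds a string of `len` random lowercase letters.
std::string RandomLetters(std::size_t len) {
  std::string s(len, 'a');
  for (char& c : s) c = static_cast<char>('a' + GenerateRandomNumber(26));
  return s;
}
```

std::rand() % max is slightly biased when max does not divide RAND_MAX + 1, which is harmless for generating test data. Re-seeding with the same value reproduces the same sequence on a given platform, although the sequence itself differs between C library implementations.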

Oh, one more optional thing. I failed to point out, in your PR, code that didn't fully conform to the Google C++ Style Guide. I won't require it for this PR and will fix up the formatting after merging this change. However, if you want Git blame to more accurately identify you as the author of certain lines you're welcome to pull over my formatting fixes from cmumford@783fcff. If not then I'll just land them in a post merge cleanup.


ruish · 2019-04-01T06:24:25Z

hi Chris @cmumford,

I was trying to apply your change cmumford@48a9a9a to my local repo.

I first built and ran issue320_test.cc without applying the rest of the fix. To my surprise, the test passed. I expected it to fail without the fix.

Could you please verify that? Should issue320_test pass without the rest of the fix?

Thanks,

Rui

ruish · 2019-04-01T07:06:44Z

BTW, my local repo is based on commit 7b945f2.


cmumford · 2019-04-01T19:49:04Z

Thanks @ruish - I introduced an error in the test when refactoring. I've uploaded a fix to cmumford@9c8d57c.


cmumford

Looks good. I'll fix up the formatting later.

cmumford · 2019-04-01T21:58:16Z

Note: I'm trying to configure Copybara to accept this PR. Please ignore any errors from it.

ruish · 2019-04-02T02:50:50Z

Hi Chris @cmumford ,

LGTM!

I applied your fix in my local repo and verified that the test passes with the original fix and fails without it, as expected.

Thanks!

Rui

jsolman · 2019-04-07T19:50:18Z

Using LevelDB built from the master branch of this repository, I was hitting this issue fairly consistently when syncing the Neo (https://neo.org) MainNet Blockchain from the latest chain file available at https://sync.ngd.network. In certain cases where compaction occurred, keys went missing. I have not encountered the issue since switching to a build that incorporates the changes from this PR.


key across multiple files. As reported in GitHub issue #339, it is incorrect to split the same user key across multiple compacted files since it causes tombstones/newer versions to be dropped, thereby exposing obsolete data. There was a fix for #339, but it did not fully fix the problem: it checked for boundary problems in the first level being compacted, but not the second. This problem was revealed by GitHub issue #887. We now adjust boundaries to avoid splitting user keys in both the first level and the second level.

PiperOrigin-RevId: 374921082
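The follow-up fix can be sketched with the same simplified model. The Key/File types and the helper names (ExtendToUserKeyBoundary, Overlapping, SelectNextLevelInputs) are hypothetical, not LevelDB's own: extend the level-i inputs to a user-key boundary, pick the overlapping files in level i+1, and then extend those level-(i+1) inputs to a user-key boundary as well. Overlapping assumes a sorted, non-overlapping level and ignores level 0's special range expansion.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical simplified stand-ins for InternalKey and FileMetaData.
struct Key {
  std::string user_key;
  uint64_t sequence;
};
struct File {
  Key smallest;
  Key largest;
};

// Internal-key order: user key ascending, sequence number descending.
int Compare(const Key& a, const Key& b) {
  if (a.user_key != b.user_key) return a.user_key < b.user_key ? -1 : 1;
  if (a.sequence != b.sequence) return a.sequence > b.sequence ? -1 : 1;
  return 0;
}

// Pulls in files that continue the user key at the upper boundary of
// `inputs` (the same idea as AddBoundaryInputs).
void ExtendToUserKeyBoundary(const std::vector<File>& level_files,
                             std::vector<File>* inputs) {
  if (inputs->empty()) return;
  Key largest = inputs->front().largest;
  for (const File& f : *inputs)
    if (Compare(f.largest, largest) > 0) largest = f.largest;
  for (;;) {
    const File* next = nullptr;
    for (const File& f : level_files) {
      if (f.smallest.user_key == largest.user_key &&
          Compare(f.smallest, largest) > 0 &&
          (next == nullptr || Compare(f.smallest, next->smallest) < 0))
        next = &f;
    }
    if (next == nullptr) return;
    largest = next->largest;
    inputs->push_back(*next);
  }
}

// Files whose user-key range intersects [lo, hi].
std::vector<File> Overlapping(const std::vector<File>& level_files,
                              const std::string& lo, const std::string& hi) {
  std::vector<File> out;
  for (const File& f : level_files)
    if (!(f.largest.user_key < lo) && !(f.smallest.user_key > hi))
      out.push_back(f);
  return out;
}

// Selects level-(i+1) inputs with boundary adjustment at both levels.
std::vector<File> SelectNextLevelInputs(const std::vector<File>& level_i,
                                        std::vector<File>* inputs0,
                                        const std::vector<File>& level_i1) {
  assert(inputs0 != nullptr);
  ExtendToUserKeyBoundary(level_i, inputs0);  // First level (original fix).
  if (inputs0->empty()) return {};
  std::string lo = inputs0->front().smallest.user_key;
  std::string hi = inputs0->front().largest.user_key;
  for (const File& f : *inputs0) {
    if (f.smallest.user_key < lo) lo = f.smallest.user_key;
    if (f.largest.user_key > hi) hi = f.largest.user_key;
  }
  std::vector<File> inputs1 = Overlapping(level_i1, lo, hi);
  ExtendToUserKeyBoundary(level_i1, &inputs1);  // Second level (later fix).
  return inputs1;
}
```

For example, with a level-i input covering b@20..c@19 and level-(i+1) files a@9..k@8 and k@5..z@1, the overlap step selects only a@9..k@8. Without the second extension, k@8 would be compacted while k@5 stayed behind in level i+1; with it, k@5..z@1 is pulled in so both versions of k are compacted together.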


richcole-at-amazon mentioned this pull request Feb 4, 2016

Compaction causes data inconsistency when using snapshots #320

Closed

franksunjin mentioned this pull request Mar 30, 2016

Dose rocksdb have the bug found in leveldb? facebook/rocksdb#993

Closed

cmumford added a commit to cmumford/leveldb that referenced this pull request Mar 18, 2019

Resolved PR google#339 merge conflicts.

6a3656b


cmumford reviewed Mar 19, 2019


Makefile (outdated)

db/version_set_test.cc (outdated)

issues/issue320_test.cc (outdated)

richcole-at-amazon force-pushed the master branch from cedb600 to 8941e6e March 22, 2019 21:20

cmumford reviewed Mar 27, 2019


This was referenced Mar 30, 2019

Rotate compaction output at user key boundary #375

Closed

Key with larger sequence got compacted to elder level, while its smaller sequence incarnation stays in younger level #376

Closed

richcole-at-amazon force-pushed the master branch from 8941e6e to 8646cbd April 1, 2019 17:18

richcole-at-amazon force-pushed the master branch from 8646cbd to 20fb601 April 1, 2019 20:12

cmumford added cla: yes cla: no labels Apr 1, 2019

googlebot removed the cla: no label Apr 1, 2019

cmumford approved these changes Apr 1, 2019


This was referenced Apr 7, 2019

Run Blockchain actor on its own thread using a PinnedDispatcher. neo-project/neo#681

Closed

Properly stop importing blocks on system shutdown. neo-project/neo-modules#66

Merged

cmumford merged commit 20fb601 into google:master Apr 12, 2019

cmumford added a commit that referenced this pull request Apr 12, 2019

Merge pull request #339 from richcole-at-amazon:master

7711e76

PiperOrigin-RevId: 243156105

Fullstop000 mentioned this pull request May 6, 2019

Does goleveldb have the snapshot compaction issue ? syndtr/goleveldb#280

Open

originalsouth mentioned this pull request Sep 6, 2019

Isssue 178 patch:Compaction causes previously deleted value to reappear #205

Open

kylezh mentioned this pull request Sep 11, 2019

issue320_test failure #723

Open

superboyiii mentioned this pull request Sep 30, 2019

Use leveldb v1.22 for neo-cli release. neo-project/neo-node#472

Closed

soraphis mentioned this pull request Jan 27, 2020

Negative performances impact from 1.19 to 1.22 #759

Open

kezhuw mentioned this pull request Feb 15, 2020

Compact level files at user key boundary kezhuw/leveldb#12

Open

tynes mentioned this pull request Apr 30, 2020

LevelDB update to v1.22 bcoin-org/bdb#10

Closed

kezhuw mentioned this pull request May 4, 2023

ZOOKEEPER-4541 Ephemeral znode owned by closed session visible in 1 of 3 servers apache/zookeeper#1925

Closed
